Conserved taxonomy assignment

Methods:

  • Phylotype method at genus level.
  • OTU-based method with minimum set at 97% identity. The opticlust algorithm was used to optimize the quality of assigned taxonomy.
  • Phylogeny method where tree was taxonomically classified with minimum set at 97% identity


Quality of conserved taxonomy

The opticlust algorithm yielded high quality OTUs based on the high precision metrics below.


Table x: Statistical parameters calculated from the OTU-based classification method
Parameter Value
cutoff 0.2
Sensitivity 0.998
Specificity 0.999
PPV 0.998
NPV 0.999
Accuracy 0.999
MCC 0.997
F1score 0.998
FDR 0.002

Overall OTU abundance

                 Phylum frequency percentage cumulative_perc
1            Firmicutes        42      66.67           66.67
2        Actinobacteria         5       7.94           74.61
3         Bacteroidetes         5       7.94           82.55
4        Proteobacteria         5       7.94           90.49
5           Tenericutes         2       3.17           93.66
6 Bacteria_unclassified         1       1.59           95.25
7   Deinococcus-Thermus         1       1.59           96.84
8       Patescibacteria         1       1.59           98.43
9       Verrucomicrobia         1       1.59          100.00

                         Class frequency percentage cumulative_perc
1                   Clostridia        31      49.21           49.21
2                      Bacilli         8      12.70           61.91
3                  Bacteroidia         5       7.94           69.85
4          Gammaproteobacteria         5       7.94           77.79
5               Actinobacteria         2       3.17           80.96
6               Coriobacteriia         2       3.17           84.13
7             Erysipelotrichia         2       3.17           87.30
8                   Mollicutes         2       3.17           90.47
9  Actinobacteria_unclassified         1       1.59           92.06
10       Bacteria_unclassified         1       1.59           93.65
11                  Deinococci         1       1.59           95.24
12     Firmicutes_unclassified         1       1.59           96.83
13             Saccharimonadia         1       1.59           98.42
14            Verrucomicrobiae         1       1.59          100.00

                              Order frequency percentage cumulative_perc
1                     Clostridiales        30      47.62           47.62
2                     Bacteroidales         4       6.35           53.97
3                   Lactobacillales         4       6.35           60.32
4                        Bacillales         3       4.76           65.08
5                  Coriobacteriales         2       3.17           68.25
6                Erysipelotrichales         2       3.17           71.42
7       Actinobacteria_unclassified         1       1.59           73.01
8                   Actinomycetales         1       1.59           74.60
9              Bacilli_unclassified         1       1.59           76.19
10            Bacteria_unclassified         1       1.59           77.78
11         Bacteroidia_unclassified         1       1.59           79.37
12            Betaproteobacteriales         1       1.59           80.96
13                Bifidobacteriales         1       1.59           82.55
14          Clostridia_unclassified         1       1.59           84.14
15                    Deinococcales         1       1.59           85.73
16                Enterobacteriales         1       1.59           87.32
17          Firmicutes_unclassified         1       1.59           88.91
18 Gammaproteobacteria_unclassified         1       1.59           90.50
19                  Mollicutes_RF39         1       1.59           92.09
20          Mollicutes_unclassified         1       1.59           93.68
21                  Pseudomonadales         1       1.59           95.27
22                Saccharimonadales         1       1.59           96.86
23    Verrucomicrobiae_unclassified         1       1.59           98.45
24                  Xanthomonadales         1       1.59          100.00

                             Family frequency percentage cumulative_perc
1                   Lachnospiraceae        13      20.63           20.63
2                   Ruminococcaceae        11      17.46           38.09
3                  Clostridiaceae_1         3       4.76           42.85
4               Erysipelotrichaceae         2       3.17           46.02
5                       Family_XIII         2       3.17           49.19
6                  Lactobacillaceae         2       3.17           52.36
7       Actinobacteria_unclassified         1       1.59           53.95
8                  Actinomycetaceae         1       1.59           55.54
9           Bacillales_unclassified         1       1.59           57.13
10             Bacilli_unclassified         1       1.59           58.72
11            Bacteria_unclassified         1       1.59           60.31
12                   Bacteroidaceae         1       1.59           61.90
13       Bacteroidales_unclassified         1       1.59           63.49
14         Bacteroidia_unclassified         1       1.59           65.08
15               Bifidobacteriaceae         1       1.59           66.67
16          Clostridia_unclassified         1       1.59           68.26
17       Clostridiales_unclassified         1       1.59           69.85
18    Coriobacteriales_unclassified         1       1.59           71.44
19                   Deinococcaceae         1       1.59           73.03
20                  Eggerthellaceae         1       1.59           74.62
21               Enterobacteriaceae         1       1.59           76.21
22          Firmicutes_unclassified         1       1.59           77.80
23 Gammaproteobacteria_unclassified         1       1.59           79.39
24     Lactobacillales_unclassified         1       1.59           80.98
25                     Listeriaceae         1       1.59           82.57
26               Mollicutes_RF39_fa         1       1.59           84.16
27          Mollicutes_unclassified         1       1.59           85.75
28                   Muribaculaceae         1       1.59           87.34
29                    Neisseriaceae         1       1.59           88.93
30                 Pseudomonadaceae         1       1.59           90.52
31                    Rikenellaceae         1       1.59           92.11
32               Saccharimonadaceae         1       1.59           93.70
33                Staphylococcaceae         1       1.59           95.29
34                 Streptococcaceae         1       1.59           96.88
35    Verrucomicrobiae_unclassified         1       1.59           98.47
36                 Xanthomonadaceae         1       1.59          100.00

Manual annotation


Figure x: Unfiltered and curated OTU abundance. Visual representation of taxon terms highlight the most abundant taxon based on frequency of being assigned to an OTU or tree nodes. Muribaculaceae is the most frequently assigned family and Muribaculaceae_ge is the most frequent species assigned to most sequences.


Most abundant species (top 15%)



OTUs
 [1] "Otu01" "Otu02" "Otu03" "Otu04" "Otu05" "Otu06" "Otu08" "Otu09"
 [9] "Otu10" "Otu17"

Phylum
[1] "Bacteroidetes" "Firmicutes"   

Class
[1] "Bacteroidia" "Clostridia" 

Order
[1] "Bacteroidales"   "Clostridiales"   "Lactobacillales"

Family
[1] "Bacteroidaceae"  "Lachnospiraceae" "Muribaculaceae"  "Ruminococcaceae"

Genus
[1] "Alistipes"          "Bacteroides"        "Lactobacillus"     
[4] "Mollicutes_RF39_ge" "Muribaculaceae_ge"  "Oscillospira"      
[7] "Turicibacter"      


Example of rank abundance


  • Demonstration using four samples
  • Rank abundance curves can be used to display species richness and species evenness across selected samples.

Figure x. Rank abundance of eight samples. Package: goeveg



Correlation between observed phyla



Figure x. Correlation between species identified at phylum-level. Species are ordered alphabetically (top panel) and heuristically (bottom panel)


Alpha diversity


Species richness at each datapoint

Figure x: Stacked barplots for species richness. The estimated richness (green bars) was calculated using chao calculator and observed ichness (red bars) was calculated using sobs.


Species richness: Boxplots, density plots and histograms

Figure x: Species richness (observed species) displayed by boxplot (A), density plots (B) and histograms (C).


Correlation between species richness and sequence depth


Figure x: Correlation between species richness and sequence depth. Observed species calculated using sobs (A) and estimated species richness by chao calculator (B).


Correlation between species diversity and species richness



Figure x: Species diversity and correlation to species richness. Definitely phylo-diversity (C) correlates well with the species richness.


Correlation between species diversity and species richness

  • Sample size: 360 samples


Example: Figure10 in iMAP paper

Example: Figure10 in iMAP paper

Species accumulation




Individual species accumulation curves using different methods


Overplot graph of species accumulation curves


Rarefaction and Extrapolation (R/E)


Type 1: Sample-size-based R/E curve


Type 2: Sample completeness curve

Figure x: Species diversity estimates as a function of sample size. Only species with abundance greater or equal to 1 are detected in the sample.


Type 3: Coverage‐based R/E curves




Shannon diversity index


Inverse Simpson diversity index



Beta diversity


Heatmaps


Phyla


Class


Order


Family


Genus


PAM clustering



Number of best clusters


OTUbased
Number_clusters     Value_Index 
         3.0000          2.0409 
Phylum
Number_clusters     Value_Index 
          3.000           4.683 
Class
Number_clusters     Value_Index 
         3.0000          4.1005 
Order
Number_clusters     Value_Index 
         3.0000          3.6381 
Family
Number_clusters     Value_Index 
         2.0000          2.7126 
Genus
Number_clusters     Value_Index 
         3.0000          2.4937 



PCoA or PCO: Principal Coordinates Analysis




NMDS (Non-metric multidimensional scaling)


Square root transformation
Wisconsin double standardization
Run 0 stress 0.0008337215 
Run 1 stress 0.001379944 
Run 2 stress 0.001709739 
Run 3 stress 0.0007277672 
... New best solution
... Procrustes: rmse 0.1037968  max resid 0.1843271 
Run 4 stress 0.06125386 
Run 5 stress 0.001599348 
Run 6 stress 9.723621e-05 
... New best solution
... Procrustes: rmse 0.08137382  max resid 0.1400455 
Run 7 stress 0.005033573 
Run 8 stress 0.001724613 
Run 9 stress 9.900323e-05 
... Procrustes: rmse 0.08519409  max resid 0.1338753 
Run 10 stress 0.001755498 
Run 11 stress 9.961223e-05 
... Procrustes: rmse 0.1026008  max resid 0.1591543 
Run 12 stress 0.001559953 
Run 13 stress 0.0002718111 
... Procrustes: rmse 0.03093071  max resid 0.06257083 
Run 14 stress 0.0005031003 
... Procrustes: rmse 0.1147307  max resid 0.1661313 
Run 15 stress 0.0009696094 
Run 16 stress 0.001690145 
Run 17 stress 9.854687e-05 
... Procrustes: rmse 0.09427951  max resid 0.1366878 
Run 18 stress 0.002123245 
Run 19 stress 0.001540382 
Run 20 stress 9.519968e-05 
... New best solution
... Procrustes: rmse 0.07870077  max resid 0.1100296 
*** No convergence -- monoMDS stopping criteria:
    14: no. of iterations >= maxit
     5: stress < smin
     1: stress ratio > sratmax
Square root transformation
Wisconsin double standardization
Run 0 stress 9.665879e-05 
Run 1 stress 9.53452e-05 
... New best solution
... Procrustes: rmse 0.1065693  max resid 0.1697169 
Run 2 stress 9.951807e-05 
... Procrustes: rmse 0.07469551  max resid 0.1115015 
Run 3 stress 9.745068e-05 
... Procrustes: rmse 0.07763297  max resid 0.1373501 
Run 4 stress 0.001067049 
Run 5 stress 4.372603e-05 
... New best solution
... Procrustes: rmse 0.1059044  max resid 0.1494925 
Run 6 stress 9.897749e-05 
... Procrustes: rmse 0.09481433  max resid 0.1718094 
Run 7 stress 7.18088e-05 
... Procrustes: rmse 0.1137924  max resid 0.1678205 
Run 8 stress 9.928715e-05 
... Procrustes: rmse 0.1261378  max resid 0.21985 
Run 9 stress 7.839463e-05 
... Procrustes: rmse 0.09973258  max resid 0.1540742 
Run 10 stress 9.605322e-05 
... Procrustes: rmse 0.1003967  max resid 0.1512087 
Run 11 stress 9.974885e-05 
... Procrustes: rmse 0.1121724  max resid 0.1756395 
Run 12 stress 9.687449e-05 
... Procrustes: rmse 0.1356559  max resid 0.1636638 
Run 13 stress 0.0003500257 
... Procrustes: rmse 0.114241  max resid 0.1680756 
Run 14 stress 9.559932e-05 
... Procrustes: rmse 0.1003592  max resid 0.1588128 
Run 15 stress 9.828561e-05 
... Procrustes: rmse 0.1074659  max resid 0.1625574 
Run 16 stress 0.06811531 
Run 17 stress 9.253481e-05 
... Procrustes: rmse 0.1305726  max resid 0.2199027 
Run 18 stress 8.978303e-05 
... Procrustes: rmse 0.1136534  max resid 0.1680695 
Run 19 stress 0.0001783715 
... Procrustes: rmse 0.09088008  max resid 0.1386487 
Run 20 stress 0.000119464 
... Procrustes: rmse 0.1023554  max resid 0.1541552 
*** No convergence -- monoMDS stopping criteria:
     4: no. of iterations >= maxit
    15: stress < smin
     1: stress ratio > sratmax
Square root transformation
Wisconsin double standardization
Run 0 stress 0.0001587822 
Run 1 stress 0.1079485 
Run 2 stress 9.997663e-05 
... New best solution
... Procrustes: rmse 0.09287425  max resid 0.1641158 
Run 3 stress 9.810815e-05 
... New best solution
... Procrustes: rmse 0.02436755  max resid 0.03330625 
Run 4 stress 9.746947e-05 
... New best solution
... Procrustes: rmse 0.05873104  max resid 0.08644727 
Run 5 stress 9.713544e-05 
... New best solution
... Procrustes: rmse 0.05144213  max resid 0.07440652 
Run 6 stress 8.838194e-05 
... New best solution
... Procrustes: rmse 0.0876071  max resid 0.1438818 
Run 7 stress 9.933442e-05 
... Procrustes: rmse 0.1576782  max resid 0.2247313 
Run 8 stress 8.754791e-05 
... New best solution
... Procrustes: rmse 0.007195614  max resid 0.01129571 
Run 9 stress 0.001114504 
Run 10 stress 9.329577e-05 
... Procrustes: rmse 0.02776029  max resid 0.04415047 
Run 11 stress 0.002764123 
Run 12 stress 0.0008529207 
Run 13 stress 8.107075e-05 
... New best solution
... Procrustes: rmse 0.004852399  max resid 0.006437002 
... Similar to previous best
*** Solution reached
Square root transformation
Wisconsin double standardization
Run 0 stress 0.000413453 
Run 1 stress 0.09395731 
Run 2 stress 0.001073979 
Run 3 stress 0.006556755 
Run 4 stress 0.004028627 
Run 5 stress 0.09395733 
Run 6 stress 0.001003869 
Run 7 stress 0.0001767323 
... New best solution
... Procrustes: rmse 0.03265705  max resid 0.05066357 
Run 8 stress 0.001002464 
Run 9 stress 0.00400832 
Run 10 stress 0.004203716 
Run 11 stress 0.0001281946 
... New best solution
... Procrustes: rmse 0.02487193  max resid 0.04081672 
Run 12 stress 0.006458073 
Run 13 stress 0.006459056 
Run 14 stress 0.0002502822 
... Procrustes: rmse 0.04356955  max resid 0.07433534 
Run 15 stress 0.09395743 
Run 16 stress 0.005568887 
Run 17 stress 0.0004471322 
... Procrustes: rmse 0.01660123  max resid 0.02545122 
Run 18 stress 0.002520092 
Run 19 stress 0.00153722 
Run 20 stress 0.0007879612 
*** No convergence -- monoMDS stopping criteria:
    17: no. of iterations >= maxit
     3: stress ratio > sratmax
Square root transformation
Wisconsin double standardization
Run 0 stress 0.008008543 
Run 1 stress 0.02935725 
Run 2 stress 0.02935702 
Run 3 stress 0.007555719 
... New best solution
... Procrustes: rmse 0.0864895  max resid 0.152547 
Run 4 stress 0.02935706 
Run 5 stress 0.02935789 
Run 6 stress 0.008318727 
Run 7 stress 0.02935669 
Run 8 stress 0.02935759 
Run 9 stress 0.007547803 
... New best solution
... Procrustes: rmse 0.000216496  max resid 0.0003536255 
... Similar to previous best
Run 10 stress 0.008344215 
*** Solution reached
Square root transformation
Wisconsin double standardization
Run 0 stress 0.01811649 
Run 1 stress 0.01986519 
Run 2 stress 0.02006416 
Run 3 stress 0.01829596 
... Procrustes: rmse 0.01189568  max resid 0.02075114 
Run 4 stress 0.01986769 
Run 5 stress 0.02027452 
Run 6 stress 0.01811644 
... New best solution
... Procrustes: rmse 2.078353e-05  max resid 3.954208e-05 
... Similar to previous best
Run 7 stress 0.01811681 
... Procrustes: rmse 0.0001101164  max resid 0.0001867403 
... Similar to previous best
Run 8 stress 0.01852101 
... Procrustes: rmse 0.02171259  max resid 0.03888147 
Run 9 stress 0.06108471 
Run 10 stress 0.01950276 
*** Solution reached


Parameters used for the NMDS


OTUs
----------------------------

Call:
metaMDS(comm = otu.t, distance = "bray", k = 3, try = 10, display = c("sites"),      choices = c(1, 2), type = "t", shrink = FALSE) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(otu.t)) 
Distance: bray 

Dimensions: 3 
Stress:     9.519968e-05 
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(otu.t))' 

Phylum
----------------------------

Call:
metaMDS(comm = phylum.t, distance = "bray", k = 3, try = 10,      display = c("sites"), choices = c(1, 2), type = "t", shrink = FALSE) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(phylum.t)) 
Distance: bray 

Dimensions: 3 
Stress:     4.372603e-05 
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(phylum.t))' 

Class
----------------------------

Call:
metaMDS(comm = class.t, distance = "bray", k = 3, try = 10, display = c("sites"),      choices = c(1, 2), type = "t", shrink = FALSE) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(class.t)) 
Distance: bray 

Dimensions: 3 
Stress:     8.107075e-05 
Stress type 1, weak ties
Two convergent solutions found after 13 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(class.t))' 

Order
----------------------------

Call:
metaMDS(comm = order.t, distance = "bray", k = 3, try = 10, display = c("sites"),      choices = c(1, 2), type = "t", shrink = FALSE) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(order.t)) 
Distance: bray 

Dimensions: 3 
Stress:     0.0001281946 
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(order.t))' 

Family
----------------------------

Call:
metaMDS(comm = family.t, distance = "bray", k = 3, try = 10,      display = c("sites"), choices = c(1, 2), type = "t", shrink = FALSE) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(family.t)) 
Distance: bray 

Dimensions: 3 
Stress:     0.007547803 
Stress type 1, weak ties
Two convergent solutions found after 10 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(family.t))' 

Genus
----------------------------

Call:
metaMDS(comm = genus.t, distance = "bray", k = 3, try = 10, display = c("sites"),      choices = c(1, 2), type = "t", shrink = FALSE) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(genus.t)) 
Distance: bray 

Dimensions: 3 
Stress:     0.01811644 
Stress type 1, weak ties
Two convergent solutions found after 10 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'wisconsin(sqrt(genus.t))' 


Sherperd and non-metric multidimensional scaling plots.



Figure X. Sherperd and non-metric multidimensional scaling plot. Green oints represent samples and red points represent OTU or species. Similar samples are ordinated together. Stress values are shown at the botthom of ordination plot.



Phylogenetic clustering and tree annotation


Figure x: Sample Phylip or Newick-formatted tree clustered using the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) algorithm. Similar data was used to construct different types of tree including rectangular (A), circular (B) and unrooted (C). The tree is annotated with sequence depth, species richness, and phylum abundance as seen in (D).

Posible questions


Alpha diversity

  • QN1: Are the values obtained too sensitive to sampling?
  • QN2: Was the sampling effort sufficient to account for most OTUs present in a sample?
  • QN3: Is there a need to continue with re-sampling?
  • QN4: …….?


Beta diversity

  • QN1: …….?
  • QN2: …….?
  • QN3: …….?
  • QN4: …….?


More intervention by investigators


Summary of packages used in the analysis

R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux buster/sid

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] NbClust_3.0     reshape2_1.4.3  iNEXT_2.0.19    ggpubr_0.2     
 [5] magrittr_1.5    goeveg_0.4.2    vegan_2.5-4     permute_0.9-5  
 [9] funModeling_1.7 Hmisc_4.2-0     Formula_1.2-3   survival_2.43-3
[13] lattice_0.20-38 gridExtra_2.3   scales_1.0.0    ggplot2_3.1.0  

loaded via a namespace (and not attached):
 [1] maps_3.3.0          splines_3.5.3       dotCall64_1.0-0    
 [4] gtools_3.8.1        moments_0.14        assertthat_0.2.0   
 [7] latticeExtra_0.6-28 pander_0.6.3        yaml_2.2.0         
[10] corrplot_0.84       pillar_1.3.1        backports_1.1.3    
[13] glue_1.3.1          digest_0.6.18       RColorBrewer_1.1-2 
[16] checkmate_1.9.1     colorspace_1.4-1    cowplot_0.9.4      
[19] htmltools_0.3.6     Matrix_1.2-16       plyr_1.8.4         
[22] pkgconfig_2.0.2     purrr_0.3.2         gdata_2.18.0       
[25] htmlTable_1.13.1    tibble_2.1.1        mgcv_1.8-27        
[28] withr_2.1.2         ROCR_1.0-7          nnet_7.3-12        
[31] lazyeval_0.2.2      crayon_1.3.4        evaluate_0.13      
[34] nlme_3.1-137        MASS_7.3-51.1       gplots_3.0.1.1     
[37] foreign_0.8-71      tools_3.5.3         data.table_1.12.0  
[40] hms_0.4.2           stringr_1.4.0       munsell_0.5.0      
[43] cluster_2.0.7-1     entropy_1.2.1       compiler_3.5.3     
[46] caTools_1.17.1.2    rlang_0.3.1         rstudioapi_0.9.0   
[49] htmlwidgets_1.3     spam_2.2-2          bitops_1.0-6       
[52] base64enc_0.1-3     labeling_0.3        rmarkdown_1.12     
[55] gtable_0.2.0        R6_2.4.0            knitr_1.22         
[58] dplyr_0.8.0.1       readr_1.3.1         KernSmooth_2.23-15 
[61] stringi_1.4.3       parallel_3.5.3      Rcpp_1.0.1         
[64] fields_9.6          rpart_4.1-13        acepack_1.4.1      
[67] tidyselect_0.2.5    xfun_0.5